We adopt a top-down approach to simulate hierarchical networks, considering various simulation parameters such as graph sparsity, noise, and the architecture of the super-level graph(s), including small-world, scale-free, and random graph networks (Watts and Strogatz 1998; Barabási and Bonabeau 2003).
Our simulations focus on basic hierarchies comprising one or two hierarchical layers. Two-layer networks mirror classical community detection on graphs, where our aim is to recover the true community labels from a given graph. Meanwhile, three-layer networks present a more intricate scenario, where the bottom layer of the hierarchy contains two levels of community structure. Here, the top level corresponds to the nodes at the uppermost layer of the hierarchy, and the middle level consists of communities nested within the top-level communities. The objective with these networks is to identify both sets of community partitions.
In each hierarchy, for fully connected networks, we initiate by simulating \(n_{\text{top}}\) top-level nodes, adhering to a directed small-world, random graph, or scale-free network architecture (Watts and Strogatz 1998; Barabási and Bonabeau 2003). In cases where the network is disconnected, we simply simulate \(n_{\text{top}}\) disconnected nodes. For networks with three hierarchical layers, we then generate a subnetwork of \(n_{\text{middle}}\) nodes from each top-layer node, adhering to the network structure utilized at the top level. If the network is fully connected, we apply a probability \(p_\text{between}\) to the nodes from different top-level communities being connected.
The final step in all hierarchies is to generate the nodes in the observed (bottom) layer of the hierarchy. For each top-layer or middle-layer node, we generate a subnetwork of \(n_{\text{bottom}}\) nodes under the same subnetwork structure as the previous layers, and we apply a probability \(p_\text{between}\) for nodes from different communities to share an edge.
Once we simulate a hierarchical graph, we utilize this hierarchy to generate the node-feature matrix, which depicts the expression of \(N\) genes across \(p\) samples. Here, \(N\) denotes the number of nodes in the observed (bottom) layer of the hierarchy, and its range is governed by \(a^{\ell+1}<N<a\times b^\ell\), where \(\ell\) signifies the number of hierarchical layers.
We simulate the node-feature matrix using the topological order the observed level graph. We start by generating the features of nodes that have no parental input. We refer to these nodes as origin nodes. All origin nodes are simulated from a normal distribution with mean \(0\) and standard deviation \(\sigma\). All other nodes are simulated from a normal distribution centered at the mean of their parent nodes and with standard deviation \(\sigma\).
Our HCD method consists of two primary components:
A graph autoencoder based on the architecture proposed by Salehi and Davulcu (2019) which utilizes graph attention layers such as those first indroduced by Veličković et al. (2017) (See most recent version of pseudocode for details). In our applications, we incorporate multi-head attention in all encoder and decoder layers to expand model learning capacity. The graph autoencoder module takes a set of node attributes and an adjacency matrix defining the relationships between the node as input and learns a low dimensional embedding of the network and attributes. This embedding is then used to reconstruct the node attributes and adjacency matrix under a separate loss function for each.
The second component of HCD takes the embeddings generated by the autoencoder and applies a multilevel community detection process. This module is composed of \(l\) fully connected layers, each representing a level in the hierarchy. Each layer’s goal is to group the \(k_{i-1}\) nodes from the previous level into \(k_i\) communities. The number of layers in the hierarchy and the number of communities at each level are predefined parameters that need to be determined through other methods. In our applications, we use the simulation truth for each parameter so that the communitiy detection module consists of \(2\) layers where the first layer assigns the nodes at the bottom layer to the true number of communities at the middle layer. The second community detection layer assigns the nodes in the middle layer to communities in the top layer.
We consider three sets of hierarchical networks which represent varying difficulty levels for inference:
A comprehensive overview of the intermediate networks is presented in Table 1.4. These networks are structured as three-layered systems, each characterized by small-world, scale-free, or random graph architectures. In contrast to the more intricate networks featured in the Complex Networks dataset, the intermediate networks exhibit a comparatively simpler configuration. Specifically, each network comprises \(5\) super layer nodes, \(15\) middle layer nodes, and \(300\) bottom layer nodes. Our primary focus in utilizing this dataset is to examine the performance of the Hierarchical Community Detection (HCD) method when applied to three-layer networks. The smaller scale of these networks facilitates a more in-depth analysis of the detected communities within the middle and upper layers of their hierarchical structures.
We apply the HCD method to each network separately using three options for the input graph corresponding to the nodes at the observed layer of the hierarchy:
The input graph is the true graph
The input graph is the correlation matrix of the simulated gene expression
The input graph is the correlation matrix of the simulated gene expression wherein correlations weaker than 0.2 are disregarded and set to zero
The input graph is the correlation matrix of the simulated gene expression wherein correlations weaker than 0.5 are disregarded and set to zero
The input graph is the correlation matrix of the simulated gene expression wherein correlations weaker than 0.7 are disregarded and set to zero
We also explore various combinations of weighting the loss function across each of the aforementioned input graphs. In all cases, we ensure that the predicted number of communities in the middle or top levels of the hierarchy aligns with the ground truth of the simulation.
We evaluate the performance of our HCD method using three graph-based clustering metrics:
homogeneity evaluates the degree to which each predicted community contains only data points from a single true community, indicating how well the algorithm avoids mixing different groups. Thus, homogenity tends to be high if resolved communities contain only members of the same true community.
completeness assesses the extent to which all data points that belong to the same true community are correctly assigned to a single predicted community. Thus completeness is always high if all members of the same true communities end up in the same resolved community even if several true communities are allocated together.
NMI is a weighted average of the previous two metrics.
For each simulation, we configure the number of communities in the middle and upper layers of the hierarchy to match the true count in each layer. Then, we evaluate the community predictions of the Hierarchical Community Detection (HCD) algorithm at these levels against the actual communities using three metrics. As a baseline, we employ the Louvain method, which utilizes hierarchical graph partitioning to maximize modularity, resulting in a single set of resolved communities. These resolved communities may align with the middle, upper, or a combination of both layers in the true hierarchy. Thus, we compute the performance metrics of the communities identified by the Louvain method against the true communities at both the upper and middle levels of the hierarchy.
Our previous reports have outlined that the HCD model can accurately capture at least one level in the hierarchy as well as the Louvain method and the HCD method showed improved capability when the input graph was an estimate of the true adjacency matrix. However, only in one of the examples (Example 12 Report 6/5/2024) we investigated did HCD accurately predict the communities at both the top and middle levels. Moreover, closer investigation of the true graph and data indicates that the data often has clear community structure at one level in the hierarchy. For example, in the 3-layer scale free network, community structure is very apparent at the middle level of the hierarchy but the top level community structure is less obvious in the correlations or adjacency matrix 1.1 - 1.2. In our examples dealing with scale free networks, HCD and the Louvain method typically accurately predicted the middle layer with HCD failing to find the community groupings at the top layer (Figure 2.9 Report 6/5/2024).
# -*- coding: utf-8 -*-
"""
Created on Sun Jun 23 15:21:08 2024
@author: Bruin
"""
## '\nCreated on Sun Jun 23 15:21:08 2024\n\n@author: Bruin\n'
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
import argparse
import numpy as np
import networkx as nx
import seaborn as sbn
import matplotlib.pyplot as plt
import sys
import pandas as pd
#sys.path.append('/mnt/ceph/jarredk/scGNN_for_genes/HC-GNN/')
#sys.path.append('/mnt/ceph/jarredk/HGRN_repo/Simulated Hierarchies/')
#sys.path.append('/mnt/ceph/jarredk/HGRN_repo/HGRN_software/')
#sys.path.append('/mnt/ceph/jarredk/scGNN_for_genes/gen_data')
#sys.path.append('C:/Users/Bruin/Documents/GitHub/scGNN_for_genes/gen_data')
#sys.path.append('C:/Users/Bruin/Documents/GitHub/scGNN_for_genes/HC-GNN/')
sys.path.append('C:/Users/Bruin/Documents/GitHub/HGRN_repo/Simulated Hierarchies/')
sys.path.append('C:/Users/Bruin/Documents/GitHub/HGRN_repo/HGRN_software/')
from simulation_software.Simulate import simulate_graph
from simulation_software.simulation_utilities import compute_graph_STATs, sort_labels
from model.utilities import get_input_graph
#import os
#os.chdir('C:/Users/Bruin/Documents/GitHub/HGRN_repo/Bethe Hessian Tests/')
import warnings
warnings.filterwarnings('ignore')
parser = argparse.ArgumentParser()
# general
import random as rd
from itertools import product
from tqdm import tqdm
import time
from sklearn.manifold import TSNE
import pickle
from sklearn.decomposition import PCA
from model.model_layer import gaeGAT_layer as GAT
from model.model import GATE, CommClassifer, HCD
from model.train import CustomDataset, batch_data, fit
from simulation_software.simulation_utilities import compute_modularity, post_hoc_embedding, compute_beth_hess_comms
from model.utilities import resort_graph, trace_comms, node_clust_eval, gen_labels_df, LoadData, build_true_graph, get_input_graph, plot_nodes, plot_clust_heatmaps
import seaborn as sbn
import matplotlib.pyplot as plt
from community import community_louvain as cl
from itertools import product, chain
from tqdm import tqdm
import pdb
import ast
import random as rd
from sklearn.manifold import TSNE
import pickle
from sklearn.decomposition import PCA
# rd.seed(123)
# torch.manual_seed(123)
# simulation default arguments
parser.add_argument('--connect', dest='connect', default='disc', type=str)
## _StoreAction(option_strings=['--connect'], dest='connect', nargs=None, const=None, default='disc', type=<class 'str'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--connect_prob', dest='connect_prob', default='use_baseline', type=str)
## _StoreAction(option_strings=['--connect_prob'], dest='connect_prob', nargs=None, const=None, default='use_baseline', type=<class 'str'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--toplayer_connect_prob', dest='toplayer_connect_prob', default=0.3, type=float)
## _StoreAction(option_strings=['--toplayer_connect_prob'], dest='toplayer_connect_prob', nargs=None, const=None, default=0.3, type=<class 'float'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--top_layer_nodes', dest='top_layer_nodes', default=5, type=int)
## _StoreAction(option_strings=['--top_layer_nodes'], dest='top_layer_nodes', nargs=None, const=None, default=5, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--subgraph_type', dest='subgraph_type', default='small world', type=str)
## _StoreAction(option_strings=['--subgraph_type'], dest='subgraph_type', nargs=None, const=None, default='small world', type=<class 'str'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--subgraph_prob', dest='subgraph_prob', default=0.05, type=float)
## _StoreAction(option_strings=['--subgraph_prob'], dest='subgraph_prob', nargs=None, const=None, default=0.05, type=<class 'float'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--nodes_per_super2', dest='nodes_per_super2', default=(3,3), type=tuple)
## _StoreAction(option_strings=['--nodes_per_super2'], dest='nodes_per_super2', nargs=None, const=None, default=(3, 3), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--nodes_per_super3', dest='nodes_per_super3', default=(20,20), type=tuple)
## _StoreAction(option_strings=['--nodes_per_super3'], dest='nodes_per_super3', nargs=None, const=None, default=(20, 20), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--node_degree_middle', dest='node_degree_middle', default=5, type=int)
## _StoreAction(option_strings=['--node_degree_middle'], dest='node_degree_middle', nargs=None, const=None, default=5, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--node_degree_bottom', dest='node_degree_bottom', default=5, type=int)
## _StoreAction(option_strings=['--node_degree_bottom'], dest='node_degree_bottom', nargs=None, const=None, default=5, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--sample_size',dest='sample_size', default = 500, type=int)
## _StoreAction(option_strings=['--sample_size'], dest='sample_size', nargs=None, const=None, default=500, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--layers',dest='layers', default = 2, type=int)
## _StoreAction(option_strings=['--layers'], dest='layers', nargs=None, const=None, default=2, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--SD',dest='SD', default = 0.1, type=float)
## _StoreAction(option_strings=['--SD'], dest='SD', nargs=None, const=None, default=0.1, type=<class 'float'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--common_dist', dest='common_dist',default = True, type=bool)
## _StoreAction(option_strings=['--common_dist'], dest='common_dist', nargs=None, const=None, default=True, type=<class 'bool'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--seed_number', dest='seed_number',default = None, type=int)
## _StoreAction(option_strings=['--seed_number'], dest='seed_number', nargs=None, const=None, default=None, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--within_edgeweights', dest='within_edgeweights',default = (0.4, 0.7), type=tuple)
## _StoreAction(option_strings=['--within_edgeweights'], dest='within_edgeweights', nargs=None, const=None, default=(0.4, 0.7), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--between_edgeweights', dest='between_edgeweights',default = (0, 0.3), type=tuple)
## _StoreAction(option_strings=['--between_edgeweights'], dest='between_edgeweights', nargs=None, const=None, default=(0, 0.3), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--use_weighted_graph', dest='use_weighted_graph',default = False, type=bool)
## _StoreAction(option_strings=['--use_weighted_graph'], dest='use_weighted_graph', nargs=None, const=None, default=False, type=<class 'bool'>, choices=None, required=False, help=None, metavar=None)
args = parser.parse_args()
args.connect_prob = 0.01
args.common_dist = False
args.subgraph_prob = 0.1
args.SD = 0.1
args.node_degree_bottom = 3
args.node_degree_middle = 3
args.force_connect = True
args.layers = 3
args.connect = 'disc'
args.subgraph_type = 'small world'
args.use_weighted_graph = False
#args.savepath = 'C:/Users/Bruin/Documents/GitHub/HGRN_repo/Simulated Hierarchies/DATA/Toy_examples/Intermediate_examples/Results/test/Single_sim_dpd_identity/'
args.savepath = 'C:/Users/Bruin/Documents/GitHub/HGRN_repo/Simulated Hierarchies/DATA/Toy_examples/Intermediate_examples/Results/test/'
#args.savepath = 'C:/Users/Bruin/Documents/GitHub/HGRN_repo/Reports/Report_4_15_2024/example7_output/'
def run_simulations(args, use_gpu = False):
device = 'cuda:'+str(0) if use_gpu and torch.cuda.is_available() else 'cpu'
print('*********** using DEVICE: {} **************'.format(device))
pe, gexp, nodes, edges, nx_all, adj_all, args.savepath, nodelabs, orin = simulate_graph(args)
true_adj_undi = build_true_graph(args.savepath+'.npz')
pdadj = pd.DataFrame(adj_all[-1])
pdadj.to_csv(args.savepath+'example_adjacency_matrix.csv')
in_graph05, in_adj05 = get_input_graph(X = pe,
method = 'Correlation',
r_cutoff = 0.5)
pdadj2 = pd.DataFrame(in_adj05)
pdadj2.to_csv(args.savepath+'example_adjacency_matrix_corrgraph.csv')
indices_top, indices_mid, labels_df, sorted_true_labels_top, sorted_true_labels_middle = sort_labels(nodelabs)
#C = F.one_hot(torch.Tensor(sorted_true_labels_top).to(torch.int64)).to(torch.float32)
#X = torch.Tensor(pe_sorted).requires_grad_()
origin_nodes = [i[0] for i in orin]
if args.layers > 2:
pe_sorted = pe[indices_mid,:]
idx = [indices_mid.index(i) for i in origin_nodes]
else:
pe_sorted = pe[indices_top,:]
idx = [indices_mid.index(i) for i in origin_nodes]
fig, ax = plt.subplots(figsize = (12, 10))
sbn.heatmap(np.corrcoef(pe_sorted))
plt.show()
labels = [sorted_true_labels_middle, sorted_true_labels_top]
TSNE_embed=TSNE(n_components=3,
learning_rate='auto',
init='random',
perplexity=3).fit_transform(pe_sorted)
PCs = PCA(n_components=3).fit_transform(pe_sorted)
which_labels = ['Middle', 'Top']
fig, ax = plt.subplots(2,2, figsize = (10,10))
for i in range(0, len(labels)):
ax[0][i].scatter(TSNE_embed[:,0], TSNE_embed[:,1],
s = 150.0, c = labels[i], cmap = 'plasma')
ax[0][i].scatter(TSNE_embed[idx,0], TSNE_embed[idx,1],
c='red', s = 300, marker = '.')
ax[0][i].set_xlabel('Dimension 1')
ax[0][i].set_ylabel('Dimension 2')
ax[0][i].set_title( 't-SNE '+which_labels[i])
#adding node labels
ax[1][i].scatter(PCs[:,0], PCs[:,1],
s = 150.0, c = labels[i], cmap = 'plasma')
ax[1][i].scatter(PCs[idx,0], PCs[idx,1],
c='red', s = 300, marker = '.')
ax[1][i].set_xlabel('Dimension 1')
ax[1][i].set_ylabel('Dimension 2')
ax[1][i].set_title( 'PCA '+which_labels[i])
plt.show()
fig3d = plt.figure(figsize=(12,10))
ax3d = plt.axes(projection='3d')
ax3d.scatter3D(PCs[:,0], PCs[:,1], PCs[:,2],
c=labels[1], cmap='plasma')
ax3d.scatter3D(PCs[idx,0], PCs[idx,1], PCs[idx,2],
c='red', s = 300, marker = '.',
depthshade = False)
ax3d.set_xlabel('Dimension 1')
ax3d.set_ylabel('Dimension 2')
ax3d.set_zlabel('Dimension 3')
plt.show()
fig3d = plt.figure(figsize=(12,10))
ax3d = plt.axes(projection='3d')
ax3d.scatter3D(TSNE_embed[:,0], TSNE_embed[:,1], TSNE_embed[:,2],
c=labels[0], cmap='plasma')
ax3d.scatter3D(TSNE_embed[idx,0], TSNE_embed[idx,1], TSNE_embed[idx,2],
c='red', s = 300, marker = '.',
depthshade = False)
ax3d.set_xlabel('Dimension 1')
ax3d.set_ylabel('Dimension 2')
ax3d.set_zlabel('Dimension 3')
plt.show()
run_simulations(args)
## *********** using DEVICE: cpu **************
## ------------------------------------------------------------
## Number of edges: 0
## Number of nodes: 5
## Mean In degree: 0.0
## Mean Out degree: 0.0
## ------------------------------------------------------------
## small world subgraphs used
## ------------------------------------------------------------
## Number of edges: 15
## Number of nodes: 15
## In degree: 1.0
## Out degree: 1.0
## ------------------------------------------------------------
## small world subgraphs used
## ------------------------------------------------------------
## Number of edges: 363
## Number of nodes: 300
## In degree: 1.21
## Out degree: 1.21
## ------------------------------------------------------------
## 300 (300, 300)
## Generating pseudoexpression...
## data dimension = (300, 500)
## (500, 300)
## number of layers detected = 3.0, items in file = ['layer1', 'adj_layer1', 'layer2', 'adj_layer2', 'layer3', 'adj_layer3', 'gen_express', 'labels']
## layer = 2, Adj Splice = [300, 315]
## layer = 1, Adj Splice = [315, 320]
# -*- coding: utf-8 -*-
"""
Created on Sun Jun 23 15:21:08 2024
@author: Bruin
"""
## '\nCreated on Sun Jun 23 15:21:08 2024\n\n@author: Bruin\n'
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
import argparse
import numpy as np
import networkx as nx
import seaborn as sbn
import matplotlib.pyplot as plt
import sys
import pandas as pd
#sys.path.append('/mnt/ceph/jarredk/scGNN_for_genes/HC-GNN/')
#sys.path.append('/mnt/ceph/jarredk/HGRN_repo/Simulated Hierarchies/')
#sys.path.append('/mnt/ceph/jarredk/HGRN_repo/HGRN_software/')
#sys.path.append('/mnt/ceph/jarredk/scGNN_for_genes/gen_data')
#sys.path.append('C:/Users/Bruin/Documents/GitHub/scGNN_for_genes/gen_data')
#sys.path.append('C:/Users/Bruin/Documents/GitHub/scGNN_for_genes/HC-GNN/')
sys.path.append('C:/Users/Bruin/Documents/GitHub/HGRN_repo/Simulated Hierarchies/')
sys.path.append('C:/Users/Bruin/Documents/GitHub/HGRN_repo/HGRN_software/')
from simulation_software.Simulate import simulate_graph
from simulation_software.simulation_utilities import compute_graph_STATs, sort_labels
from model.utilities import get_input_graph
#import os
#os.chdir('C:/Users/Bruin/Documents/GitHub/HGRN_repo/Bethe Hessian Tests/')
import warnings
warnings.filterwarnings('ignore')
parser = argparse.ArgumentParser()
# general
import random as rd
from itertools import product
from tqdm import tqdm
import time
from sklearn.manifold import TSNE
import pickle
from sklearn.decomposition import PCA
from model.model_layer import gaeGAT_layer as GAT
from model.model import GATE, CommClassifer, HCD
from model.train import CustomDataset, batch_data, fit
from simulation_software.simulation_utilities import compute_modularity, post_hoc_embedding, compute_beth_hess_comms
from model.utilities import resort_graph, trace_comms, node_clust_eval, gen_labels_df, LoadData, build_true_graph, get_input_graph, plot_nodes, plot_clust_heatmaps
import seaborn as sbn
import matplotlib.pyplot as plt
from community import community_louvain as cl
from itertools import product, chain
from tqdm import tqdm
import pdb
import ast
import random as rd
from sklearn.manifold import TSNE
import pickle
from sklearn.decomposition import PCA
# rd.seed(123)
# torch.manual_seed(123)
# simulation default arguments
parser.add_argument('--connect', dest='connect', default='disc', type=str)
## _StoreAction(option_strings=['--connect'], dest='connect', nargs=None, const=None, default='disc', type=<class 'str'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--connect_prob', dest='connect_prob', default='use_baseline', type=str)
## _StoreAction(option_strings=['--connect_prob'], dest='connect_prob', nargs=None, const=None, default='use_baseline', type=<class 'str'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--toplayer_connect_prob', dest='toplayer_connect_prob', default=0.3, type=float)
## _StoreAction(option_strings=['--toplayer_connect_prob'], dest='toplayer_connect_prob', nargs=None, const=None, default=0.3, type=<class 'float'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--top_layer_nodes', dest='top_layer_nodes', default=5, type=int)
## _StoreAction(option_strings=['--top_layer_nodes'], dest='top_layer_nodes', nargs=None, const=None, default=5, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--subgraph_type', dest='subgraph_type', default='small world', type=str)
## _StoreAction(option_strings=['--subgraph_type'], dest='subgraph_type', nargs=None, const=None, default='small world', type=<class 'str'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--subgraph_prob', dest='subgraph_prob', default=0.05, type=float)
## _StoreAction(option_strings=['--subgraph_prob'], dest='subgraph_prob', nargs=None, const=None, default=0.05, type=<class 'float'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--nodes_per_super2', dest='nodes_per_super2', default=(3,3), type=tuple)
## _StoreAction(option_strings=['--nodes_per_super2'], dest='nodes_per_super2', nargs=None, const=None, default=(3, 3), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--nodes_per_super3', dest='nodes_per_super3', default=(20,20), type=tuple)
## _StoreAction(option_strings=['--nodes_per_super3'], dest='nodes_per_super3', nargs=None, const=None, default=(20, 20), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--node_degree_middle', dest='node_degree_middle', default=5, type=int)
## _StoreAction(option_strings=['--node_degree_middle'], dest='node_degree_middle', nargs=None, const=None, default=5, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--node_degree_bottom', dest='node_degree_bottom', default=5, type=int)
## _StoreAction(option_strings=['--node_degree_bottom'], dest='node_degree_bottom', nargs=None, const=None, default=5, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--sample_size',dest='sample_size', default = 500, type=int)
## _StoreAction(option_strings=['--sample_size'], dest='sample_size', nargs=None, const=None, default=500, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--layers',dest='layers', default = 2, type=int)
## _StoreAction(option_strings=['--layers'], dest='layers', nargs=None, const=None, default=2, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--SD',dest='SD', default = 0.1, type=float)
## _StoreAction(option_strings=['--SD'], dest='SD', nargs=None, const=None, default=0.1, type=<class 'float'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--common_dist', dest='common_dist',default = True, type=bool)
## _StoreAction(option_strings=['--common_dist'], dest='common_dist', nargs=None, const=None, default=True, type=<class 'bool'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--seed_number', dest='seed_number',default = None, type=int)
## _StoreAction(option_strings=['--seed_number'], dest='seed_number', nargs=None, const=None, default=None, type=<class 'int'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--within_edgeweights', dest='within_edgeweights',default = (0.4, 0.7), type=tuple)
## _StoreAction(option_strings=['--within_edgeweights'], dest='within_edgeweights', nargs=None, const=None, default=(0.4, 0.7), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--between_edgeweights', dest='between_edgeweights',default = (0, 0.3), type=tuple)
## _StoreAction(option_strings=['--between_edgeweights'], dest='between_edgeweights', nargs=None, const=None, default=(0, 0.3), type=<class 'tuple'>, choices=None, required=False, help=None, metavar=None)
parser.add_argument('--use_weighted_graph', dest='use_weighted_graph',default = False, type=bool)
## _StoreAction(option_strings=['--use_weighted_graph'], dest='use_weighted_graph', nargs=None, const=None, default=False, type=<class 'bool'>, choices=None, required=False, help=None, metavar=None)
args = parser.parse_args()
args.connect_prob = 0.01
args.common_dist = False
args.subgraph_prob = 0.1
args.SD = 0.1
args.node_degree_bottom = 15
args.node_degree_middle = 3
args.force_connect = True
args.layers = 3
args.connect = 'disc'
args.subgraph_type = 'small world'
args.use_weighted_graph = False
#args.savepath = 'C:/Users/Bruin/Documents/GitHub/HGRN_repo/Simulated Hierarchies/DATA/Toy_examples/Intermediate_examples/Results/test/Single_sim_dpd_identity/'
args.savepath = 'C:/Users/Bruin/Documents/GitHub/HGRN_repo/Simulated Hierarchies/DATA/Toy_examples/Intermediate_examples/Results/test/'
#args.savepath = 'C:/Users/Bruin/Documents/GitHub/HGRN_repo/Reports/Report_4_15_2024/example7_output/'
def run_simulations(args, use_gpu = False):
device = 'cuda:'+str(0) if use_gpu and torch.cuda.is_available() else 'cpu'
print('*********** using DEVICE: {} **************'.format(device))
pe, gexp, nodes, edges, nx_all, adj_all, args.savepath, nodelabs, orin = simulate_graph(args)
true_adj_undi = build_true_graph(args.savepath+'.npz')
pdadj = pd.DataFrame(adj_all[-1])
pdadj.to_csv(args.savepath+'example_adjacency_matrix.csv')
in_graph05, in_adj05 = get_input_graph(X = pe,
method = 'Correlation',
r_cutoff = 0.5)
pdadj2 = pd.DataFrame(in_adj05)
pdadj2.to_csv(args.savepath+'example_adjacency_matrix_corrgraph.csv')
indices_top, indices_mid, labels_df, sorted_true_labels_top, sorted_true_labels_middle = sort_labels(nodelabs)
#C = F.one_hot(torch.Tensor(sorted_true_labels_top).to(torch.int64)).to(torch.float32)
#X = torch.Tensor(pe_sorted).requires_grad_()
origin_nodes = [i[0] for i in orin]
if args.layers > 2:
pe_sorted = pe[indices_mid,:]
idx = [indices_mid.index(i) for i in origin_nodes]
else:
pe_sorted = pe[indices_top,:]
idx = [indices_mid.index(i) for i in origin_nodes]
fig, ax = plt.subplots(figsize = (12, 10))
sbn.heatmap(np.corrcoef(pe_sorted))
plt.show()
labels = [sorted_true_labels_middle, sorted_true_labels_top]
TSNE_embed=TSNE(n_components=3,
learning_rate='auto',
init='random',
perplexity=3).fit_transform(pe_sorted)
PCs = PCA(n_components=3).fit_transform(pe_sorted)
which_labels = ['Middle', 'Top']
fig, ax = plt.subplots(2,2, figsize = (10,10))
for i in range(0, len(labels)):
ax[0][i].scatter(TSNE_embed[:,0], TSNE_embed[:,1],
s = 150.0, c = labels[i], cmap = 'plasma')
ax[0][i].scatter(TSNE_embed[idx,0], TSNE_embed[idx,1],
c='red', s = 300, marker = '.')
ax[0][i].set_xlabel('Dimension 1')
ax[0][i].set_ylabel('Dimension 2')
ax[0][i].set_title( 't-SNE '+which_labels[i])
#adding node labels
ax[1][i].scatter(PCs[:,0], PCs[:,1],
s = 150.0, c = labels[i], cmap = 'plasma')
ax[1][i].scatter(PCs[idx,0], PCs[idx,1],
c='red', s = 300, marker = '.')
ax[1][i].set_xlabel('Dimension 1')
ax[1][i].set_ylabel('Dimension 2')
ax[1][i].set_title( 'PCA '+which_labels[i])
plt.show()
fig3d = plt.figure(figsize=(12,10))
ax3d = plt.axes(projection='3d')
ax3d.scatter3D(PCs[:,0], PCs[:,1], PCs[:,2],
c=labels[1], cmap='plasma')
ax3d.scatter3D(PCs[idx,0], PCs[idx,1], PCs[idx,2],
c='red', s = 300, marker = '.',
depthshade = False)
ax3d.set_xlabel('Dimension 1')
ax3d.set_ylabel('Dimension 2')
ax3d.set_zlabel('Dimension 3')
plt.show()
fig3d = plt.figure(figsize=(12,10))
ax3d = plt.axes(projection='3d')
ax3d.scatter3D(TSNE_embed[:,0], TSNE_embed[:,1], TSNE_embed[:,2],
c=labels[0], cmap='plasma')
ax3d.scatter3D(TSNE_embed[idx,0], TSNE_embed[idx,1], TSNE_embed[idx,2],
c='red', s = 300, marker = '.',
depthshade = False)
ax3d.set_xlabel('Dimension 1')
ax3d.set_ylabel('Dimension 2')
ax3d.set_zlabel('Dimension 3')
plt.show()
run_simulations(args)
## *********** using DEVICE: cpu **************
## ------------------------------------------------------------
## Number of edges: 0
## Number of nodes: 5
## Mean In degree: 0.0
## Mean Out degree: 0.0
## ------------------------------------------------------------
## small world subgraphs used
## ------------------------------------------------------------
## Number of edges: 15
## Number of nodes: 15
## In degree: 1.0
## Out degree: 1.0
## ------------------------------------------------------------
## small world subgraphs used
## ------------------------------------------------------------
## Number of edges: 2171
## Number of nodes: 300
## In degree: 7.236666666666666
## Out degree: 7.236666666666666
## ------------------------------------------------------------
## 300 (300, 300)
## Generating pseudoexpression...
## data dimension = (300, 500)
## (500, 300)
## number of layers detected = 3.0, items in file = ['layer1', 'adj_layer1', 'layer2', 'adj_layer2', 'layer3', 'adj_layer3', 'gen_express', 'labels']
## layer = 2, Adj Splice = [300, 315]
## layer = 1, Adj Splice = [315, 320]
| Subgraph type | Connect. type | Layers | StDev. | Nodes per layer | Edges per layer | Subgraph prob. | Sample size | Modularity (top) | Avg. node degree top | Avg edges within communities (top) | Avg. edges between communities (top) | Modularity (middle) | Avg. node degree middle | Avg edges within communities (middle) | Avg edges between communities (middle) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| small world | disc | 2 | 0.1 | (10, 63) | (0, 63) | 0.05 | 500 | 0.90 | 1.00 | 6.3 | 0.00 | NA | NA | NA | NA |
| small world | disc | 2 | 0.5 | (10, 63) | (0, 63) | 0.05 | 500 | 0.90 | 1.00 | 6.3 | 0.00 | NA | NA | NA | NA |
| small world | disc | 3 | 0.1 | (10, 63, 1604) | (0, 63, 2011) | 0.05 | 500 | 0.90 | 1.25 | 201.1 | 0.00 | 0.76 | 1.25 | 24.82 | 0.11 |
| small world | disc | 3 | 0.5 | (10, 63, 1604) | (0, 63, 2031) | 0.05 | 500 | 0.90 | 1.27 | 203.1 | 0.00 | 0.76 | 1.27 | 24.97 | 0.12 |
| small world | full | 2 | 0.1 | (10, 63) | (45, 115) | 0.05 | 500 | 0.45 | 1.82 | 6.3 | 0.58 | NA | NA | NA | NA |
| small world | full | 2 | 0.5 | (10, 63) | (45, 109) | 0.05 | 500 | 0.48 | 1.73 | 6.3 | 0.51 | NA | NA | NA | NA |
| small world | full | 3 | 0.1 | (10, 63, 1604) | (45, 114, 1604) | 0.05 | 500 | 0.77 | 1.43 | 199.3 | 3.39 | 0.67 | 1.43 | 24.94 | 0.19 |
| small world | full | 3 | 0.5 | (10, 63, 1604) | (45, 111, 1604) | 0.05 | 500 | 0.77 | 1.44 | 201.3 | 3.28 | 0.66 | 1.44 | 24.87 | 0.19 |
| Subgraph type | Connect. type | Layers | StDev. | Nodes per layer | Edges per layer | Subgraph prob. | Sample size | Modularity (top) | Avg. node degree top | Avg edges within communities (top) | Avg. edges between communities (top) | Modularity (middle) | Avg. node degree middle | Avg edges within communities (middle) | Avg edges between communities (middle) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| scale free | disc | 2 | 0.1 | (10, 58) | (0, 74) | 0.05 | 500 | 0.89 | 1.28 | 7.4 | 0.00 | NA | NA | NA | NA |
| scale free | disc | 2 | 0.5 | (10, 58) | (0, 74) | 0.05 | 500 | 0.89 | 1.28 | 7.4 | 0.00 | NA | NA | NA | NA |
| scale free | disc | 3 | 0.1 | (10, 58, 1450) | (0, 74, 6700) | 0.05 | 500 | 0.89 | 4.62 | 670.0 | 0.00 | 0.91 | 4.62 | 107.07 | 0.15 |
| scale free | disc | 3 | 0.5 | (10, 58, 1450) | (0, 74, 6670) | 0.05 | 500 | 0.89 | 4.60 | 667.0 | 0.00 | 0.91 | 4.60 | 107.07 | 0.14 |
| scale free | full | 2 | 0.1 | (10, 58) | (45, 120) | 0.05 | 500 | 0.51 | 2.07 | 7.4 | 0.51 | NA | NA | NA | NA |
| scale free | full | 2 | 0.5 | (10, 58) | (45, 120) | 0.05 | 500 | 0.51 | 2.07 | 7.4 | 0.51 | NA | NA | NA | NA |
| scale free | full | 3 | 0.1 | (10, 58, 1450) | (45, 123, 1450) | 0.05 | 500 | 0.85 | 4.78 | 665.9 | 3.03 | 0.88 | 4.78 | 107.07 | 0.22 |
| scale free | full | 3 | 0.5 | (10, 58, 1450) | (45, 122, 1450) | 0.05 | 500 | 0.85 | 4.84 | 671.4 | 3.42 | 0.86 | 4.84 | 107.07 | 0.25 |
| Subgraph type | Connect. type | Layers | StDev. | Nodes per layer | Edges per layer | Subgraph prob. | Sample size | Modularity (top) | Avg. node degree top | Avg edges within communities (top) | Avg. edges between communities (top) | Modularity (middle) | Avg. node degree middle | Avg edges within communities (middle) | Avg edges between communities (middle) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| random graph | disc | 2 | 0.1 | (10, 45) | (0, 32) | 0.05 | 500 | 0.88 | 0.71 | 3.2 | 0.00 | NA | NA | NA | NA |
| random graph | disc | 2 | 0.5 | (10, 45) | (0, 32) | 0.05 | 500 | 0.88 | 0.71 | 3.2 | 0.00 | NA | NA | NA | NA |
| random graph | disc | 3 | 0.1 | (10, 45, 725) | (0, 32, 678) | 0.05 | 500 | 0.89 | 0.94 | 67.8 | 0.00 | 0.78 | 0.94 | 12.16 | 0.07 |
| random graph | disc | 3 | 0.5 | (10, 45, 725) | (0, 32, 665) | 0.05 | 500 | 0.88 | 0.92 | 66.5 | 0.00 | 0.80 | 0.92 | 12.22 | 0.06 |
| random graph | full | 2 | 0.1 | (10, 45) | (45, 77) | 0.05 | 500 | 0.31 | 1.71 | 3.2 | 0.50 | NA | NA | NA | NA |
| random graph | full | 2 | 0.5 | (10, 45) | (45, 77) | 0.05 | 500 | 0.31 | 1.71 | 3.2 | 0.50 | NA | NA | NA | NA |
| random graph | full | 3 | 0.1 | (10, 45, 725) | (45, 78, 725) | 0.05 | 500 | 0.76 | 1.04 | 65.5 | 1.10 | 0.70 | 1.04 | 12.18 | 0.10 |
| random graph | full | 3 | 0.5 | (10, 45, 725) | (45, 78, 725) | 0.05 | 500 | 0.72 | 1.09 | 65.7 | 1.48 | 0.67 | 1.09 | 12.16 | 0.12 |
| Subgraph type | Connect. type | Layers | StDev. | Nodes per layer | Edges per layer | Subgraph prob. | Sample size | Modularity (top) | Avg. node degree top | Avg edges within communities (top) | Avg. edges between communities (top) | Modularity (middle) | Avg. node degree middle | Avg edges within communities (middle) | Avg edges between communities (middle) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| small world | disc | 3 | 0.1 | (5, 15, 300) | (0, 15, 363) | 0.05 | 500 | 0.80 | 1.21 | 72.6 | 0.00 | 0.76 | 1.21 | 20.00 | 0.30 |
| small world | full | 3 | 0.1 | (5, 15, 300) | (10, 25, 300) | 0.05 | 500 | 0.68 | 1.35 | 70.8 | 2.50 | 0.68 | 1.35 | 20.00 | 0.50 |
| scale free | disc | 3 | 0.1 | (5, 15, 300) | (0, 10, 975) | 0.05 | 500 | 0.78 | 3.25 | 195.0 | 0.00 | 0.86 | 3.25 | 61.33 | 0.26 |
| scale free | full | 3 | 0.1 | (5, 15, 300) | (10, 20, 300) | 0.05 | 500 | 0.73 | 3.35 | 191.4 | 2.40 | 0.84 | 3.35 | 61.33 | 0.41 |
| random graph | disc | 3 | 0.1 | (5, 12, 167) | (0, 7, 138) | 0.05 | 500 | 0.79 | 0.83 | 27.6 | 0.00 | 0.76 | 0.83 | 9.67 | 0.17 |
| random graph | full | 3 | 0.1 | (5, 12, 167) | (10, 17, 167) | 0.05 | 500 | 0.64 | 0.92 | 26.2 | 1.15 | 0.67 | 0.92 | 9.75 | 0.28 |
| Subgraph type | Connect. type | Layers | StDev. | Nodes per layer | Edges per layer | Subgraph prob. | Sample size | Modularity (top) | Avg. node degree top | Avg edges within communities (top) | Avg. edges between communities (top) | Modularity (middle) | Avg. node degree middle | Avg edges within communities (middle) | Avg edges between communities (middle) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| small world | disc | 2 | 0.1 | (2, 6) | (0, 6) | 0.05 | 500 | 0.50 | 1.00 | 3 | 0.0 | NA | NA | NA | NA |
| small world | disc | 3 | 0.1 | (2, 6, 18) | (0, 6, 24) | 0.05 | 500 | 0.50 | 1.33 | 12 | 0.0 | 0.58 | 1.33 | 3 | 0.20 |
| small world | full | 2 | 0.1 | (2, 6) | (1, 7) | 0.05 | 500 | 0.36 | 1.17 | 3 | 0.5 | NA | NA | NA | NA |
| small world | full | 3 | 0.1 | (2, 6, 18) | (1, 7, 18) | 0.05 | 500 | 0.46 | 1.39 | 12 | 0.5 | 0.55 | 1.39 | 3 | 0.23 |
Figure 1.1: True and predicted adjacency matrices for examples 12 - 14
Figure 1.2: Heatmaps showing HCD prediction performance for examples 12 - 14
Figure 1.3: Correlations on the simulated data and HCD model bottleneck dimension embeddings for examples 12-14